feat(retry): 429 rate-limit retry + multi-provider integration validation by aksOps · Pull Request #10 · RandomCodeSpace/asr

aksOps · 2026-05-14T16:22:45Z

Summary

v1.5-C follow-up. Three improvements caught while live-validating the per-agent provider story (intake on Ollama, downstream on OpenRouter):

feat(retry) — Free / shared upstream tiers (e.g. OpenRouter …:free models) throttle on 30-60s windows. The existing 5xx backoff (1.5s/3s/4.5s) exhausts retries before the window clears, surfacing the 429 as EnvelopeMissingError or agent failed. Added a separate _RATE_LIMIT_MARKERS set + longer rate_limit_base_delay (7.5s/15s/22.5s, total ~45s).
test(integration) — tests/test_integration_driver_s1.py was written in the Phase 15 (response_format JSON) era; its responder skill prompt missed the Phase 22 markdown contract → live Ollama call hard-failed with EnvelopeMissingError. Added the contract block to the prompt. Also added an azure parametrize arm so the live verification covers all three production provider kinds. Per-leg skip semantics — partial-key environments now exercise whichever providers they can reach.
chore(config) — Switch llm.default to workhorse and point workhorse at inclusionai/ring-2.6-1t:free. Demonstrates the v1.5-C per-agent flow with two real providers in the same INC. Operators on a paid OpenRouter plan should swap back to a paid model.
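
To make the timing argument concrete, here is a minimal sketch of the two schedules. It assumes the delay is a linear multiple of the base (inferred from the 1.5s/3s/4.5s figures) and that three sleeps happen before the error is raised. `backoff_schedule` is a hypothetical helper, not a function from this repository.

```python
def backoff_schedule(base_delay: float, retries: int = 3) -> list[float]:
    """Return the sleep durations for a linear backoff: base * 1, base * 2, ..."""
    # Assumption: linear growth, matching the figures quoted in the summary.
    return [base_delay * n for n in range(1, retries + 1)]


short = backoff_schedule(1.5)  # 5xx / connection-reset regime
long = backoff_schedule(7.5)   # 429 / rate-limit regime

print(short, sum(short))  # [1.5, 3.0, 4.5] 9.0
print(long, sum(long))    # [7.5, 15.0, 22.5] 45.0
```

The short schedule gives up after about 9s, well inside a 30-60s throttle window, while the long one waits about 45s in total.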

Changes

| Commit | What |
| --- | --- |
| `c638352` | `_ainvoke_with_retry` two-regime backoff + 5 new tests |
| `c8da236` | S1 driver: markdown contract in prompt + Azure leg + per-leg skip |
| `7d29cf0` | `config/config.yaml` default → free OpenRouter model |
| `1df2072` | dist regeneration for the retry change |

Test plan

uv run ruff check src/ tests/ — clean
uv run pytest -x — 1265 passed, 8 skipped (was 1260, added 5)
tests/test_integration_driver_s1.py::…[local] — PASSES end-to-end against Ollama Cloud gpt-oss:20b
dist bundles regenerated

Live verification matrix (with this dev environment's `.env`)

| Leg | Result | Reason if not green |
| --- | --- | --- |
| `local` (Ollama Cloud) | ✅ pass | n/a |
| `workhorse` (OpenRouter free model) | ⚠️ rate-limited | the new 429 retry is what this PR adds to make multi-call sessions reliable |
| `azure` | ⚠️ Connection error | `.env` has placeholder `AZURE_ENDPOINT='noop…'`; framework path itself constructs `AzureChatOpenAI` cleanly |

🤖 Generated with Claude Code

Free / shared upstream tiers (e.g. OpenRouter ``…:free`` models) throttle on short windows that need 30-60s to clear. The existing 5xx backoff (1.5s/3s/4.5s, total ~9s) exhausts retries before the window opens again, surfacing the 429 as an EnvelopeMissingError or a hard ``agent failed`` row.

Split ``_ainvoke_with_retry`` into two backoff regimes:

* 5xx + connection-reset markers: existing ``base_delay`` (1.5s) → 1.5s / 3.0s / 4.5s
* 429 / rate-limit markers: new ``rate_limit_base_delay`` (7.5s) → 7.5s / 15.0s / 22.5s (total ~45s before raising)

``_RATE_LIMIT_MARKERS`` covers the variants real providers emit: ``status code: 429``, ``error code: 429``, the bare ``" 429"`` / ``"429 "`` (with space-guard against false positives like 1429), ``ratelimiterror`` (langchain's exception class name), ``rate limit`` / ``rate-limited``, and ``too many requests``.

Non-429 4xx (401 unauthorized, 422 schema validation, etc.) keep their fast-fail behaviour: retrying a quota / auth / schema error just wastes time and masks the real problem.

5 new tests in ``tests/test_ainvoke_retry_429.py``:

* ``test_retries_on_5xx_and_returns_eventually``: pins the short-backoff path stays at 1.5s.
* ``test_retries_on_429_with_longer_backoff``: pins the 7.5s/15s progression.
* ``test_429_phrasings_all_match``: exercises every marker.
* ``test_non_transient_error_propagates_without_retry``: fast-fail on 401.
* ``test_429_exhausts_max_attempts_then_raises``: bounded retry, no infinite loop.

Suite: 1265 passed (was 1260, added 5), ruff clean.
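
The two-regime split described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ``runtime.graph``: the names `classify`, `ainvoke_with_retry`, `TRANSIENT_MARKERS`, the `sleep` hook, and the default `max_attempts=4` are all hypothetical, and the 5xx marker strings are guesses. Only the rate-limit marker list and the two base delays come from the text above.

```python
import asyncio

# Rate-limit markers as listed in the commit message (matched case-insensitively).
RATE_LIMIT_MARKERS = (
    "status code: 429", "error code: 429", " 429", "429 ",
    "ratelimiterror", "rate limit", "rate-limited", "too many requests",
)
# Assumed 5xx / connection-reset markers; the real set is not shown in the PR.
TRANSIENT_MARKERS = ("status code: 5", "error code: 5", "connection reset")


def classify(exc: BaseException) -> str:
    """Map an exception to 'rate_limit', 'transient', or 'fatal' by substring match."""
    text = f"{type(exc).__name__}: {exc}".lower()
    if any(marker in text for marker in RATE_LIMIT_MARKERS):
        return "rate_limit"
    if any(marker in text for marker in TRANSIENT_MARKERS):
        return "transient"
    return "fatal"  # e.g. 401 / 422: fail fast, never retry


async def ainvoke_with_retry(call, *, max_attempts=4, base_delay=1.5,
                             rate_limit_base_delay=7.5, sleep=asyncio.sleep):
    """Await call(), retrying with a regime-specific linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception as exc:
            kind = classify(exc)
            # Re-raise non-retryable errors and the final failed attempt.
            if kind == "fatal" or attempt == max_attempts:
                raise
            base = rate_limit_base_delay if kind == "rate_limit" else base_delay
            await sleep(base * attempt)  # 7.5s, 15s, 22.5s for the 429 regime
```

Injecting `sleep` keeps a test from actually waiting: a fake coroutine can record the requested delays and return immediately.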

Two issues caught while live-validating v1.5-C against real providers:

1. **Stale skill prompt.** The S1 driver's ``responder`` skill was written in the Phase 15 (response_format JSON) era; its system_prompt told the LLM "respond in one sentence" with no markdown contract instructions. Phase 22 (markdown-primary turn output) made that fail with ``EnvelopeMissingError`` because the parser has nothing to lift. Add the ``## Response`` / ``## Confidence`` / ``## Signal`` contract block to the prompt, the same pattern as the production skill prompts under ``examples/incident_management/skills/*/system.md``.

2. **No Azure parametrize arm.** The driver covered ``workhorse`` (OpenRouter) + ``local`` (Ollama). Azure has been first-class in ``runtime.llm.get_llm`` since Phase 13 but had no live verification path. Add an ``azure`` parametrize arm that constructs an ``AzureChatOpenAI`` from ``AZURE_OPENAI_KEY`` + ``AZURE_ENDPOINT`` + ``AZURE_DEPLOYMENT`` (defaults to ``gpt-4o``).

Per-leg skip semantics: each arm independently skips when its keys are absent. Replaces the global ``pytestmark.skipif`` that required ALL three keys for any leg to run; partial-key environments now exercise whichever providers they can reach.

Drops the ``_OPENROUTER_KEY and _OLLAMA_KEY and _OLLAMA_BASE_URL`` global gate; the per-leg gate inside the test body owns it. The ``LLMConfig`` builder also handles a fully-keyless environment by falling through to a stub provider so config validation passes during test collection.

Live verification status (with the keys in this dev environment):

* ``local``: PASSES against Ollama Cloud gpt-oss:20b
* ``workhorse``: fails on credit / rate-limit (account-specific)
* ``azure``: fails on connection error (placeholder endpoint in .env; framework path itself is intact)
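
The per-leg skip described in this commit message might look roughly like the sketch below. It is a guess at the shape, not the driver's actual code: `LEG_ENV`, `missing_env`, `test_driver_leg`, and the OpenRouter / Ollama environment variable names are hypothetical. Only the ``AZURE_OPENAI_KEY`` and ``AZURE_ENDPOINT`` names and the three leg names come from the text above.

```python
import os
from collections.abc import Mapping

import pytest

# Hypothetical mapping from each leg to the environment variables it needs.
LEG_ENV = {
    "workhorse": ("OPENROUTER_API_KEY",),             # assumed name
    "local": ("OLLAMA_API_KEY", "OLLAMA_BASE_URL"),   # assumed names
    "azure": ("AZURE_OPENAI_KEY", "AZURE_ENDPOINT"),  # named in the commit message
}


def missing_env(leg: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables this leg needs that are unset or empty."""
    return [name for name in LEG_ENV[leg] if not env.get(name)]


@pytest.mark.parametrize("leg", sorted(LEG_ENV))
def test_driver_leg(leg: str) -> None:
    # The gate lives inside the test body, so each leg skips on its own
    # instead of a module-level skipif that needs every key at once.
    missing = missing_env(leg)
    if missing:
        pytest.skip(f"{leg}: missing {', '.join(missing)}")
    # The live provider call for this leg would go here.
```

With only the Ollama variables set, the `local` case runs while `workhorse` and `azure` report as skipped with the missing variable names in the reason.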

…6-1t:free

Demonstrates the v1.5-C per-agent provider story end-to-end with two REAL providers in flight:

* intake (skill override) → Ollama Cloud gpt-oss:20b
* triage / DI / resolution (default) → OpenRouter inclusionai/ring-2.6-1t:free

The free OpenRouter tier rate-limits aggressively; the preceding ``feat(retry)`` commit's 429 backoff (7.5s/15s/22.5s) keeps multi-agent INC runs working through transient throttles. Operators on a paid OpenRouter plan should swap the model back to ``openai/gpt-4o-mini`` (or any other paid model); the rest of the registry is unchanged.

Bundles dist/app.py + dist/apps/{code-review,incident-management}.py in line with the ``runtime.graph._RATE_LIMIT_MARKERS`` + ``_ainvoke_with_retry`` rate-limit branch from the preceding feat commit. No bundle-only edits.

sonarqubecloud · 2026-05-14T16:26:56Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

aksOps added 4 commits May 14, 2026 16:19

build: regenerate dist for 429 retry support

1df2072


aksOps merged commit adefae6 into main May 14, 2026
8 checks passed

aksOps deleted the feat/rate-limit-retry-and-multi-provider-validation branch May 14, 2026 16:29
